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Abstract 

We conducted a comprehensive assessment of genomic repeat content in two snake genomes, the venomous copperhead 
(Agkistrodon contortrix) and the Burmese python (Python molurus bivittatus). These two genomes are both relatively small 
(~1 .4 Gb) but have surprisingly extensive differences in the abundance and expansion histories of their repeat elements. In 
the python, the readily identifiable repeat element content is low (21%), similar to bird genomes, whereas that of the 
copperhead is higher (45%), similar to mammalian genomes. The copperhead's greater repeat content arises from the recent 
expansion of many different microsatellites and transposable element (TE) families, and the copperhead had 23-fold greater 
levels of TE-related transcripts than the python. This suggests the possibility that greater TE activity in the copperhead is 
ongoing. Expansion of CR1 LINEs in the copperhead genome has resulted in TE-mediated microsatellite expansion 
("microsatellite seeding") at a scale several orders of magnitude greater than previously observed in vertebrates. Snakes also 
appear to be prone to horizontal transfer of TEs, particularly in the copperhead lineage. The reason that the copperhead has 
such a small genome in the face of so much recent expansion of repeat elements remains an open question, although 
selective pressure related to extreme metabolic performance is an obvious candidate. TE activity can affect gene regulation as 
well as rates of recombination and gene duplication, and it is therefore possible that TE activity played a role in the evolution 
of major adaptations in snakes; some evidence suggests this may include the evolution of venom repertoires. 

Key words: Burmese python, copperhead, microsatellite seeding, non-avian reptile comparative genomics, transposable 
elements. 



Introduction 

Among vertebrates, the snake lineage represents an im- 
pressively speciose (~3 1 00 sp.) and phenotypically diverse 
radiation, and as a result, snakes have become increasingly 
important model systems for diverse research areas. 
Snakes provide a unique model system for studying 
extreme physiological remodeling and metabolic cycling 
(Secor and Diamond 1995, 1998) and in venom-related 
research (Fry et al. 2006; Ikeda et al. 2010). Snakes have 
also become important modelsfor developmental biology, 



evolutionary ecology, and molecular evolution and adap- 
tation (Cohn and Tickle 1 999; Fry et al. 2006; Castoe et al. 
2008, 2009b; Vonk et al. 2008). Despite the importance of 
snakes as models for basic and biomedical research, there 
is little known about the genomes of snakes and about 
reptile genomes in general (Shedlock et al. 2007; Janes 
et al. 2010). 

Our aim here was thus to obtain comprehensive 
sequence-based comparative insight into snake genomic 
diversity, particularly the diversity and structure of the 
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repetitive element landscape. Such insight is important 
because repetitive elements comprise major portions of 
vertebrate genomes and exert major influences over ge- 
nome evolution. Among repetitive sequences, transposable 
elements (TEs) in particular have had a tremendous impact 
on the structural and functional evolution of genes and ge- 
nomes. Numerous studies have documented howTE activity 
and ectopic recombination between TE copies promote 
small- and large-scale variation in the structure of genomes; 
such rearrangements provide a substrate for the emergence 
of new functional sequences, both coding and noncoding, 
including the birth of new protein-coding genes and the 
rewiring of regulatory networks (Feschotte 2008; Cordaux 
and Batzer 2009; Herpin et al. 2010). Despite a recent 
comprehensive review summarizing current knowledge 
about reptilian TEs (Kordis 2009), our understanding of 
vertebrate TE diversity and evolutionary dynamics remains 
largely dominated by perspectives from mammalian and 
to a lesser extent avian genomes. 

The speciose nature and evolutionary age of the snake 
radiation make it an excellent amniote lineage for compar- 
isons to mammals. Snakes and mammals share a common 
ancestor -310 Ma and snakes diverged from other squa- 
mate reptiles about -170 Ma (Castoe et al. 2009a), which 
slightly predates the estimated split of eutherian (placental) 
and metatherian (marsupial) mammals. In this study, we 
chose to focus on two fairly distantly related snake species, 
the Burmese python (Python molurus bivittatus) and the 
copperhead (Agkistrodon contortrix) — these two lineages 
share a common ancestor at about the same time as do 
all eutherian mammals, around 100 Ma (Castoe et al. 
2009a). In comparison to mammalian genomes, snake 
genomes are generally small (Gregory et al. 2007), ranging 
from 1.3 to 3.8 Gbp and averaging 2.1 Gbp, and the two 
snakes chosen both have similarly sized genomes, on the 
small side of this range (—1.4 Gbp). Evidence from early 
DNA reassociation studies suggests that there may be exten- 
sive variation in the genomic repeat landscapes among 
snake species, particularly between pythons and colubroid 
species such as the copperhead (Olmo et al. 1981; Olmo 
1984). These two snakes also represent important lineages 
for research. The Burmese python (P. molurus bivittatus) is 
an important model for physiological and metabolic adap- 
tation and the copperhead (A contortrix) is a model for 
metabolic adaptation and a viperid model for studies related 
to venom. Although distantly related, these two lineages 
(pythons and viperids) have convergently evolved extremely 
dynamic metabolisms to facilitate the infrequent consump- 
tion of large prey (Secor and Diamond 2000). 

To gain insight into the repeat landscapes of these two ge- 
nomes and the evolutionary processes that have shaped them, 
we obtained low-coverage 454 high-throughput sequencing 
data from genomic shotgun libraries, as well as 454 transcrip- 
tome sequence showing evidence of TE transcriptional activity 



in both species. Using this data, we analyzed TE and simple 
sequence repeat (SSR) content, diversity, and origins. The re- 
sults reveal extraordinary differences between these two 
snakes. They also contribute to a broader understanding of 
vertebrate genome evolution and diversity by beginning to 
show how snake genomes compare to one another and to 
other vertebrates. 

Materials and Methods 

Shotgun Library Creation and Sequencing 

Whole genome random shotgun libraries were made from 
two snake species, A. contortrix (the copperhead; from 
a Texas population) and P. molurus bivittatus (the Burmese 
python; obtained from the pet trade); animals and tissues 
were procured and processed through the Amphibian and 
Reptile Diversity Research Center at The University of Texas 
Arlington. Total DNA was prepared from liquid-nitrogen 
snap-frozen liver tissue by standard phenol-chloroform-iso- 
amyl alcohol extraction methods. 454 FLX-LR and 454 Tita- 
nium-XLR genomic shotgun libraries were prepared using the 
454 shotgun library preparation kit and protocol (Roche). Li- 
braries were sequenced on the Roche 454 FLX sequencing 
platform. From the Agkistrodon FLX-LR shotgun library, 
60.3 Mb from 280,303 sequence reads were collected using 
Roche/454 FLX-LR sequencing kits, amounting to about 
4.5% of the estimated 1.35 Gbp (Gregory et al. 2007) ge- 
nome (supplementary tables S1 , Supplementary Material on- 
line). Two libraries from the same individual (one FLX and one 
Titanium) were sequenced for the Python; from the FLX-LR 
library, we sequenced 61,256 reads, totaling 13.3 Mbp, 
and from the Titanium-XLR library, we sequenced 57,717 
reads, totaling 15.2 Mbp. In sum, 28.5 Mbp of Python se- 
quence from 118,973 reads were collected, representing 
—2.0% of its estimated 1.42 Gbp genome (based on esti- 
mates of the related Python reticulatus genome, supplemen- 
tary tables S1, Supplementary Material online). Comparisons 
of repeat annotations of FLX-LR and FLX-XLR data for Python 
indicated extremely little difference between the two data 
types (supplementary fig. S10, Supplementary Material 
online). Sequence reads with similarity to the mitochondrial 
genome were filtered out prior to analyses (see supplemen- 
tary methods, Supplementary Material online). 

Repeat Analyses 

We used the current release of the Tetrapoda RepBase 
(version 12.12, 17 January 2008) as the repeat library 
for RepeatMasker (Smit et al. 2010) to identify known re- 
peat elements in the snake genomes. For comparisons, we 
also ran RepeatMasker on —50 Mbp of the Anolis genome 
(AnoCarl.O) from four combined genome scaffolds. For 
SSR analysis, we used a previously written Perl script 
(Castoe et al. 2010) that was modified to identify SSR loci 
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with repeated sequence motifs of 2-6 bases in length and 
a minimum of 1 2 bases in length (for 2-4mers) or contain- 
ing 3 or more tandem repeats (for 5-6mers). We used the 
program RepeatModeler (Smit A, unpublished data) to 
identify de novo repeat sequences in our snake data sets, 
based on the run parameters suggested as defaults by the 
program. The approach essentially couples two de novo 
repeat finding methods, RECON (Bao and Eddy 2002) 
and RepeatScout (Price et al. 2005), together with Tandem 
Repeat Finder (Benson 1 999). We modified RepeatModel- 
er's RepeatMasker parameters to specify the Tetrapoda li- 
brary. For all RepeatModeler analyses, we combined the 
new Python and Agkistrodon libraries into a single joint 
snake library to recover as many elements as possible 
and control for differences in sequencing depth. Consen- 
sus sequences from RepeatModeler were classified using 
RepClass (Feschotte et al. 2009). By identifying novel 
"families" that hit the same known TE family, we were able 
to reduce the original count of new family consensus se- 
quences by 1 8.6% and 20.3% in Python and Agkistrodon, 
respectively (see supplementary methods and supplemen- 
tary tables S2 and S3, Supplementary Material online). We 
also used the program P-clouds for identifying de novo 
repeats, with the following parameter settings: 2, 3, 6, 
1 2, and 24 for low, core, step-1 , step-2, and step-3 cutoffs 
(Gu et al. 2008). Details provided in supplementary meth- 
ods (Supplementary Material online). 

Consensus sequences for select TEs were assembled in 
an iterative fashion, using successive rounds of mapping 
reads to existing element sequences to extend element 
alignments and then taking the consensus of these align- 
ments. Mapping was conducted using gsmapper software 
(Roche) and used to estimate read coverage across the 
length of elements. Forestimation of sequence divergence 
from element consensus sequences, sequences were 
aligned using POA (Lee et al. 2002). In the case of LINEs, 
for which we expected multiple subfamilies to exist, se- 
quence divergence was estimated excluding sites inferred 
to define element subfamilies (see supplementary meth- 
ods, Supplementary Material online), similar to (Aleshin 
and Zhi 2010). 

cDNA Sequencing and Analysis 

RNA was extracted, poly-A enriched, and cDNA libraries 
were prepared from A. contortrix and R molurus liver 
tissue samples using standard techniques. Libraries were bi- 
directionally sequenced using the FLX-LR reagents on the 
454 FLX instrument. All steps were carried out on both sam- 
ples, side-by-side, from RNA extraction through sequencing. 
Transcript contigs were assembled using the 454 GSAssem- 
bler, and we searched for TE sequences using our snake- 
specific libraries in RepeatMasker. Details provided in 
supplementary methods (Supplementary Material online). 



Results 

The 60.3 Mbp sequenced from the A contortrix (copperhead 
hereafter, for readability in the text) random shotgun sequence 
library amounts to about 4.5% of its estimated 1.35 Gbp 
genome, whereas the 28. 5 M bp sequenced from the P. molurus 
(python hereafter) represents about 2.0% of its estimated 1 .42 
Gbpgenome(supplementarytableS1, Supplementary Material 
online; NCBI Sequence Read Archive accession SRA029568.1 ; 
also available at www.snakegenomics.org/SnakeGenomics/ 
RawData. html). Distributions of base call quality scores are very 
similar for both species, facilitating direct comparisons between 
the data for both species (supplementary fig. S1 , Supplemen- 
tary Material online). To provideabaselineestimateforexpected 
levels of neutral sequence divergence among the python, 
copperhead, and anole, we analyzed a previously published 
12 nuclear protein-coding gene data set (Wiens et al. 2010). 
We estimated an average synonymous divergence (i.e., dS 
branch lengths; see supplementary methods, Supplementary 
Material online) of 0.709 substitutions/site between the anole 
and the snakes and 0.221 substitutions/site between the 
python and copperhead. 

To obtain a preliminary understanding of repetitive con- 
tent in these genomes, we first analyzed the frequencies 
of 15mers in 28 Mbp samples from both genomes (con- 
sisting of a random 28 Mbp sample from the copperhead 
and all of the python data). We chose 1 5mers because, by 
chance, any one 15mer should occur only about once in 
a genome of this size, making high-copy 1 5mers extremely 
infrequent by chance and thus indicative of repetitive 
elements (Gu et al. 2008). Both species contained similar 
amounts of single copy 1 5mers (1 5,364,028 in the python 
and 17,186,377 in the copperhead), but the python had 
more low-to-moderate copy number 15mers (i.e., 2-10 
copies; fig. 1/\). In contrast, the copperhead had more 
high-copy 15mers (fig. ^A) I suggesting that it has more 
highly similar (recently expanded) repeat elements, 
whereas python repeat elements are comparatively older 
and/or fewer in number. Analysis of the anole lizard (Anolis 
carolinensis) genome revealed a 15mer profile similar to 
the copperhead (fig. 1), suggesting that it too has a sub- 
stantial number of recently expanded repeats, and thus 
a repetitive landscape more similar to the copperhead than 
the python. 

Identification of Interspersed and Tandem Repeats 

The common method of identifying recognizable repeat el- 
ements is to scan sequences using a library of known repeat 
element consensus sequences (e.g., RepBase; Jurka 2000) 
with sequence similarity-based algorithms such as Repeat- 
Masker (Smit et al. 2010). Such homology-based methods 
cannot recognize elements not in the database and have 
low power to identify repeat element fragments, even up 
to 200 bp, from moderately diverged repeat families (de 
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Fig. 1. — Comparison of repeat analyses. (A) The frequency of each 
different 15mer sequence was counted, and the number shown is the 
number of different 15mers having a particular count. Equal size 
samples of the genomes of the two snakes, copperhead (Agkistrodon) 
and python (Python), and the lizard (Anolis) were considered. SSRs were 
removed from the analyzed data using RepeatMasker/RepBase. (B) The 
repeat annotation methods (P-clouds and RepeatModeler/ 'Repeat- 
Masker) in the two snakes were compared with determine the percent 
of the genome (in nucleotides) that was masked by either method 
alone, both methods, or neither method (i.e., remained unannotated). 
(O The size distribution of P-clouds results in the two snakes are shown. 

Koning APJ, Gu W, Castoe TA, Pollock DD, unpublished 
data). For this reason, the identification of interspersed 
and tandem repeats in snakes is problematic because there 
are no closely related organisms in which repeat elements 
have been studied in-depth, and thus there are no snake- 
specific repeat libraries available. Because unassembled 
next-generation sequence reads likely contain many (per- 
haps mostly) TE sequence fragments, we utilized a three- 
pronged strategy to evaluate repeat content and maximize 
detection and classification of repeat elements: first, we ap- 
plied the P-clouds method, which can estimate the repeat 
fraction without knowing a priori what the repeat elements 
are and which is relatively powerful at identifying short re- 
peat fragments (Gu et al. 2008); second, we utilized the 
"Tetrapoda" repeat consensus library in RepBase to detect 
similar sequences by homology searching; and third, we 
used RepeatModeler (Smith A, unpublished data) to identify 



new snake-specific repeat element clusters (families). We 
analyzed the output of RepeatModeler with RepClass 
(Feschotte et al. 2009), a tool that automates the classifica- 
tion of newly discovered TEs (supplementary figs. S2 and S3, 
Supplementary Material online). Repeat family consensus 
sequence libraries identified are available at www. 
snakegenomics.org/SnakeGenomics/Processed_Data.html. 

Previous analyses using the P-douds method have esti- 
mated genomic repetitive content in repeat-poor bird ge- 
nomes at around 40% (Warren et al. 2010), in more 
repeat-rich genomes of the Anolis lizard at 73.7% (Warren 
et al. 201 0), and in the human and panda genomes at about 
70% (Li etal. 2010). In comparison, the P-clouds estimate of 
the repetitive content of the python genome is 39.8%, sim- 
ilar to bird genomes, whereas the predicted copperhead re- 
petitive content is 55.2%, intermediate between birds and 
mammals/lizards (supplementary table S1, Supplementary 
Material online). 

Only 4.48% of the python and 1 1 .81 % of the copper- 
head (supplementary table S1, Supplementary Material on- 
line) were identified as readily classifiable repeats (SSRs, 
tandem repeats, low-complexity sequences, and known 
TEs from RepBase). An additional 16.73% of the python 
and 32.77% of the copperhead were identified as repeat 
elements using the newly identified snake-specific repeat li- 
brary from RepeatModeler (fig. 1 B and supplementary figs. 
S2 and S3, Supplementary Material online), and about one- 
third of these new families identified in both the copperhead 
and the python were classified by RepClass into known TE 
classes (655/1996 and 203/571, respectively; supplemen- 
tary tables S2-S7, Supplementary Material online). It is im- 
portant to note that the same snake-specific repeat library 
was used to annotate both species and was derived from 
running RepeatModeler on data from each species indepen- 
dently and then combining the resulting libraries; this ap- 
proach was designed to increase sensitivity and decrease 
bias due to different amounts of genomic sampling in the 
two species. 

All methods agree that the copperhead has considerably 
more detectable repetitive sequence than the python, and 
a large part of this repetitive fraction in both species arises 
from recognizable TEs. Altogether about 45% of the 
copperhead was annotated by the homology-based meth- 
ods compared with only 21% of the python genome; in 
contrast, the two estimates from P-clouds are 55% and 
40%, respectively (fig. IB and supplementary table S1, 
Supplementary Material online). We expect P-clouds to 
be more sensitive than homology-based methods for 
identifying more divergent and fragmented repeat 
elements, and the observation that the estimated python 
repetitive content is nearly twice as high based on P-clouds 
analysis indicates that much of its repetitive content may be 
older and/or more fragmented than that of the copperhead. 
There is substantial overlap between the methods (fig. ^B) I 
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Fig. 2. — Comparison of the TE and simple repeat content in copperhead (Agkistrodon) and python (Python) genomes. TE families were 
determined based on the combined annotations of Repbase, RepeatScout, RepeatModeler, and RepClass, and coverage in the genome was annotated 
using RepeatMasker. 



and most (—67-76%) of the regions uniquely annotated by 
P-douds are fairly long (>50 bp; fig. 10- The P-douds- 
unique sequences may also include non-TE-derived sequen- 
ces such as tandem duplications or large multigene families 
but given the level of sampling (<5% of each genome) and 
the makeup of known complete genomes, we expect that 
these types of repeats make up a fairly small percentage of 
the P-douds annotation. These results support the conclu- 
sion from the 15mer profile analysis that the copperhead 
has many homogeneous TE sequences compared with 
the more diverged and/or lower frequency repeats in the 
python. 

TE Landscapes 

The joint annotation of previously known RepBase ele- 
ments, together with newly identified elements classified 
using RepClass (Feschotte etal. 2009), revealed a substantial 
diversity of TEs in snake genomes (fig. 2 and supplementary 
fig. S1 , Supplementary Material online). Almost all types of 
repeats appear to occur more frequently in the copperhead 
than the python (fig. 2 and supplementary tables S6 and S7, 
Supplementary Material online), but the breadth of diversity 
in each of the snakes was similar, with most subclasses and 
superfamilies (e.g., Bov-B LINEs, DIRS1) found in the copper- 
head are also represented in the python (fig. 2). Although 
repetitive elements in the Anolis lizard genome have not 
yet been thoroughly annotated, the diversity of repeat types 
observed in the snakes was broadly similar to preliminary 
estimates of the diversity of repeat elements in the anole 
lizard genome (supplementary fig. S4 and table S8, Supple- 
mentary Material online). Comparing the two snakes, there 
are substantial differences in TE abundance (fig. 2; supple- 
mentary table S8, Supplementary Material online). The 



greatest difference lies in the abundance of CR1 LINEs that 
are more than four times as frequent in the copperhead, in 
contrast to Bov-B LINEs, which are similarly abundant in both 
genomes (fig. 2). 

For in-depth analysis of snake LINEs, we manually assem- 
bled consensus sequences of Bov-B and CR1 LINEs for each 
species (now available in RepBase). Analysis of the distribu- 
tion of sequence divergence (from species-specific consen- 
sus sequences, excluding subfamily-defining sites) within 
these two LINE superfamilies reveals contrasting expansion 
histories both between LINEs and between species (fig. 3A 
and B). Both snake lineages experienced recent (and likely 
independent) expansion of Bov-B LINEs, indicated by the 
low sequence divergence among these LINEs within each 
species (fig 3A). The bulk of divergence within python 
Bov-B LINEs appears to have occurred slightly more recently 
than for the copperhead. In contrast, CR1 accumulation ap- 
pears to have occurred over an extended period in both lin- 
eages and likely in the ancestor of both of these species. We 
estimated 1 1 % neutral divergence at synonymous sites of 
protein-coding genes above (0.221 substitutions per site 
pairwise difference between species, divided by 2). The find- 
ing that a notable proportion of CR1 elements in the snakes 
exceed 1 1 % divergence suggests that CR1s have been ac- 
tive over a long time period in snake genomes, including 
being active in the ancestor of these two snakes. The ex- 
tended time period of CR1 activity in both snakes, followed 
by a recent decrease in activity, contrasts sharply with the 
timing of Bov-B activity, which shows predominantly recent 
activity in both snakes (fig. 3). Also, despite a similar age 
distribution of CR1 elements in both snakes, the copper- 
head lineage accumulated many more CR1s than the py- 
thon lineage in every time period (fig. 3B). 
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Fig. 3. — Sequence divergence of selected TEs. The species-specific 
consensus sequences were determined for (A) Bov-B LINEs and (B) CR1 
LINEs, and the sequence divergence levels were calculated for all 
alignable sequences of these types, excluding sites that appeared to 
define subfamilies. 



Evidence from LINEs suggests that there may be substan- 
tially different genomic processes at work on the two snake 
genomes that results in truncation and possible purging of 
longer repeat elements along the copperhead lineage. 
Whereas about half of Bov-B LINEs appear near full length 
in the python genome based on sequence coverage in reads 
mapped to the python Bov-B LINE consensus sequence 
(fig. 4>4 and supplementary fig. SI OA, Supplementary 
Material online), a vast majority of Bov-B LINEs appear to 
be truncated in the copperhead genome (in which the 3- 
prime end of Bov-B is vastly overrepresented). Copperhead 
CR1 LINEs are truncated similarly to copperhead Bov-B 
LINES, with the last 600-800 bp of the element greatly over- 
represented in the sampled sequences, although the low 
copy number of CR1 LINEs in the python prevents meaning- 
ful comparisons between species (fig. A-B and supplemen- 
tary fig. S10E, Supplementary Material online). 

In addition to a greater abundance of LINEs, the copper- 
head also has a much greater abundance of both Gypsy-like 
(2.18% vs. 0.21%) and DIRS (0.84% vs. 0.03%) retrotrans- 
posons compared with the python; these families are also 
both more abundant in the python than in the anole lizard 
(fig. 2 and supplementary table S8, Supplementary Material 




# ^ ^ r 

Nucleotide position along consensus sequence 

Fig. 4. — Depth of sequence coverage. The coverage depth per 
megabase of genomic sequence is shown for the 3' ends of (A) Bov-B 
and (B) snakel CR1 LINEs for both the copperhead (Agkistrodon) and 
the python (Python) genome samples. 

online). An increased abundance of DNA transposons in the 
copperhead (4.58% vs. 1.92% in the python) is also ob- 
served, primarily due to increases in hobo-Activator-Tam3 
(hAT) transposons (fig. 2). The copperhead also experienced 
a notable expansion of SSRs and low-complexity regions rel- 
ative to the python and lizard and contains more unclassified 
elements than the python (1 6.45% vs. 9.29%). The rarity of 
identified SINEs may be an artifact because SINEs are often 
difficult to classify due to their short length, rapid evolution, 
and turnover and because they do not encode proteins. 

At the family level, over three times more new snake- 
specific families were identified (using RepeatModeler) in 
the copperhead (1 ,996) than the python (571 ). This de novo 
repeat identification method produces many potentially 
redundant family descriptions, but after collapsing redun- 
dant families (see supplementary methods, Supplementary 
Material online), the result holds; only 82 new collapsed fam- 
ilies from the python were identified by RepClass compared 
with 243 in the copperhead (supplementary tables S2-S5, 
Supplementary Material online). Many more element fam- 
ilies were identified in the copperhead compared with the 
python for numerous element types (supplementary table 
S4, Supplementary Material online), including CR1 and L1 
LINEs, penelope retrotransposons, gypsy and DIRS retro- 
transposons, and hobo-Activator (hAt) and Mariner DNA 
transposons (supplementary table S4, Supplementary Ma- 
terial online). The familial diversity of DNA transposons (1 6 
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new python families, 48 new copperhead families), DIRS (0 
new python families, 38 new copperhead families), and 
gypsy (8 new python families, 49 new copperhead families) 
are particularly skewed (supplementary table S4, Supple- 
mentary Material online). This indicates that the greater 
TE content in the copperhead compared with the python 
is based not only on greater numbers of elements but also 
on greater element diversity at the more fine-scale family 
level. Higher element abundance in the copperhead was 
not limited to a particular set of elements but rather distrib- 
uted across a diverse set of elements. Furthermore, per el- 
ement type, there tend to be more new element families/ 
subfamilies identified in the copperhead; this is especially 
the case for more frequent element types. 
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Fig. 5. — Sequence divergence between SPIN DNA transposon 
(MITE) sequences in the copperhead genome. Divergences from the 
consensus sequence were calculated as in figure 3. 



Evidence for Horizontal Transfer of TEs 

Previous studies have inferred horizontal transfer of Bov-B 
LINEs between mammals and snakes and/or squamate rep- 
tiles to explain the enigmatic distribution of these elements 
across amniote vertebrates (Kordis and Gubensek 1997, 
1998b). Based on phylogenetic analysis of Bov-B sequences 
from available vertebrate genomes, we estimate that cop- 
perhead and anole Bov-B sequences are more closely related 
to each other than either sequence is to the python Bov-B 
sequences. In terms of organismal phylogeny, snakes are 
uniformly believed to form a monophyletic group to the ex- 
clusion of lizards (Townsend et al. 2004; Vidal and Hedges 
2005; Castoe et al. 2009b; Wiens et al. 2010). Therefore, 
our inference that there are multiple lineages of snake 
Bov-B elements implies multiple episodes of horizontal 
transfer of Bov-B LINEs to or from squamate reptiles (supple- 
mentary fig. S5, Supplementary Material online). We also 
found traces of Maverick DNA transposons only in the py- 
thon (supplementary tables S6 and S7, Supplementary Ma- 
terial online), and these are not otherwise known from 
squamate reptiles, although there is a report of one from 
the sister lineage of squamates, the tuatara (Pritham 
et al. 2007). 

Hobo-Activator-Tam3 (hAT) DNA transposons are barely 
detectable in the python but comprise -2% of the copper- 
head genome sample (fig. 2 and supplementary table S8, 
Supplementary Material online). Space invader (SPIN) ele- 
ments, a type of hAT DNA transposon, are known to have 
been independently horizontally transferred into the ge- 
nomes of multiple tetrapod lineages within the last 15- 
46 My, including that of the anole lizard (Pace et al. 
2008; Novick et al. 201 0). We found evidence of numerous 
SPINs in the genome of the copperhead (fig. 5) but found 
none in the python (corroborated by polymerase chain re- 
action [PCR] and Southern hybridizations; Feschotte C, un- 
published data). In the copperhead, most SPIN-related 
sequences (1 , 1 42) found were MITEs (proximal ends) of SPIN 
elements representing deletion derivatives of longer and 



presumably autonomous elements. An additional 19 reads 
mapped to (non-MITE) internal regions of the anole lizard 
SPIN transposon consensus sequence (Pace et al. 2008). 
SPIN MITE sequences from the copperhead display relatively 
low levels of sequence divergence (from a copperhead SPIN 
consensus), averaging around 6% (fig. 5). This is consistent 
with recent activity and invasion of SPINs into the copper- 
head genome at a similar time frame (<45 Ma) as SPINs ap- 
pear to have invaded other tetrapod lineages, long after the 
~ 1 00 Ma split between the python and copperhead (Castoe 
et al. 2009a), although the lack of a known neutral substi- 
tution rate for squamate genomes precludes precise dating. 

SSR Structure 

The frequency pattern of SSRs in snakes is similar to the fre- 
quency pattern of repetitive elements in that the copper- 
head has about four times the SSR content of the python 
and 2 or more times that of the anole lizard, which itself 
has substantially more than other reptiles or birds examined 
(fig. 6/\). As with the TEs, it is surprising to observe such an 
expansion in a genome that is smaller than most other rep- 
tile genomes. Although the number of SSRs identified varies 
somewhat depending on the identification method (e.g., 
fig. 6A and supplementary fig. S5 and table S7, Supplemen- 
tary Material online), the approximate proportions remain 
similar. The relative abundance of SSR loci in the copper- 
head, compared with the python and the anole, is also con- 
sistently higher across all repeat motif length classes, from 
2mers to 6mers (supplementary fig. S6, Supplementary Ma- 
terial online). This points to a general expansion of SSR loci. 
The excessive relative enrichment of 4mers and 5mers in the 
copperhead, however, indicates a possible role for a motif- 
specific mechanism as well. 

SSR motif sequence frequencies are quite similar be- 
tween the python and anole lizard and surprisingly different 
in the copperhead, suggesting accelerated evolution of SSRs 
along the linage leading to the copperhead (supplementary 
figs. S6 and S7, Supplementary Material online). Due to 
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Fig. 6. — Expansion of specific SSR motifs in the copperhead genome. {A) The number of SSR loci per megabase for a sampling of amniote 
genomes is shown along with a phylogenetic tree of their relationships. Estimates for squamate reptiles based on RepeatMasker analyses; estimates for 
non-squamates taken from Shedlock et al. (2007). (B) The 3mer and 5mer SSR loci of python {Python) and copperhead {Agkistrodon) are shown sorted 
first by SSR sequence motif and then by SSR length (in base pairs). The height of each bar corresponds to the length of each SSR (in base pairs), and the 
width is proportional to the identified number of sequences with a particular motif and length. The width of the portion of the graph devoted to each 
motif is proportional to the motif's relative abundance among SSRs (in terms of number of loci). The regions of the graph devoted to motifs ATA and 
AATAG are indicated with double arrows. (O Two alternative SSR tails at the 5' ends of snakel CR1 LINEs are shown along with the estimated copy 
number of this LINE family in the two snakes. 



common descent, the frequencies of SSR motifs in different 
species are expected to be correlated, and under the as- 
sumption of a constant rate of evolution (birth and death) 
of SSRs, the degree of correlation should decrease with the 
divergence time between species. The SSR motif-specific 
frequency profiles in the anole and python have a linear re- 
gression coefficient of R 2 = 0.716 compared with R 2 = 
0.405 between the two snakes. The contrasting correlation 
strengths were particularly strong for comparisons of 2mers, 
4mers, and 5mers (supplementary fig. S6, Supplementary 
Material online). These results are consistent with the hy- 
pothesis that SSR motifs are typically stable for long periods 



of time (e.g., the time separating the python and the anole), 
but that the copperhead lineage has undergone an unusual 
amount of SSR turnover resulting in a major change in the 
SSR motif frequencies and overall abundances in the copper- 
head genome. 

Microsatellite Seeding by CR1 LINEs 

Certain SSR sequence motifs were greatly expanded in the 
copperhead, including ATA, ATAG, AATAG (fig. 6B and sup- 
plementary fig. S6, Supplementary Material online), which 
are notably similar to one another. Analysis of the flanking 
sequences of these highly expanded copperhead SSR loci 
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Fig. 7. — .Relative frequencies of TEs in liver cDNA transcripts. Relative transcript frequencies in the two snakes are shown in a radar graph on 
a logarithmic scale. Sequences shown had long regions of high similarity to known TEs. 



showed that the 3mer (ATA) n and 5mer (AATAG) n were as- 
sociated with other non-SSR repetitive sequences: these SSR 
motifs are found in the 3' tails of a snake-specific family of 
CR1 LINEs that we refer to as snakel CR1 LINEs (fig. 6C). 
Snakel CR1s are found at low levels in the python sample 
(196 element fragments; —7 element fragments/Mbp) but 
found -25 times more often in the copperhead sample 
(11,324 element fragments; —189 element fragments/ 
Mbp); because the number of times an element was sam- 
pled multiple times by different fragments in these samples 
is expected to be negligible, we therefore estimate there to 
be —10,000 snakel CR1 LINEs in the python genome versus 
—254,000 in the copperhead (fig. 60- In the copperhead, 
41 .4% of all ATA SSR loci and 22.7% of all AATAG SSR loci 
were flanked by readily identifiable snakel CR1 LINEs; only 
a small fraction of ATAG SSRs were flanked by snakel CR1 
LINEs (0.9%), and it is thus unclear whether LINEs directly 
seed these ATAG repeats or if they are mutated versions 
of related 3mers or 5mers. The most extreme perturbations 
in microsatellite frequencies between the two snakes thus 
seem to be due to the "microsatellite-seeding" (Arcot 
etal. 1995) activity of these snakel CR1 LINEs. 

The elements that we are calling snakel CR1 LINEs have 
been identified previously (Nobuhisa et al. 1 998; Fujimi et al. 
2002; Ikeda et al. 2010) but were conflated with Bov-B LINEs 
because of a misannotation of a novel Bov-B LINE that was 
actually a Bov-B LINE flanked by two snakel CR1 LINE frag- 
ments (supplementary fig. S9, Supplementary Material on- 
line). Snakel CR1 LINEs are also notable because our data 
confirm previous speculation that they occur at high- 
frequency throughout phospholipase venom genes in viperid 
snakes (Ikeda et al. 2010), numerous other venom genes in 
viperids and elapids (based on Blast analysis; supplementary 



table S1 0, Supplementary Material online), and in HOX gene 
clusters of colubrid snakes (Di-Poi et al. 2010). Although it 
would be ideal to test for significant enrichment of CR1 el- 
ements adjacent to venom genes, this is currently not pos- 
sible because the lack of diversity of available sequences 
for venomous snake genomes that include both genie 
and intergenic annotated regions. 

We also found that snake Bov-B LINEs tend to have 
a (CAA) n microsatellite repeat at their 3-prime end and thus 
appear to be capable of seeding (CAA) n microsatellites. De- 
spite this, we do not find evidence for any substantial expan- 
sion of (CAA) n SSRs in the genomes of either snake species 
surveyed here (supplementary fig. S6, Supplementary Mate- 
rial online). It is also notable that even when all known LINE- 
associated SSR motifs are excluded from consideration, 
there are still large differences in SSR motif abundance be- 
tween the snakes (e.g., supplementary figs. S6-S8, Supple- 
mentary Material online). This implies that in addition to the 
major impact of LINEs, other LINE-independent effects have 
altered SSR abundances in the lineage leading to the cop- 
perhead. 

Evidence of TE Transcriptional Activity 

Transcription levels in liver tissue samples from both species 
were evaluated to determine whether TE elements are actively 
transcribed in living snake tissues (NCBI Sequence Read Ar- 
chive accession SRA029568.1; also available at www. 
snakegenomics.org/SnakeGenomics/Raw_Data.html). Tran- 
scripts with sequence homology to every TE class were more 
frequent in the copperhead liver transcriptome (fig. 7 and sup- 
plementary table S1 1 , Supplementary Material online), with 
23-fold greater overall levels of transcription of TEs in the 
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copperhead compared with the python, including many dif- 
ferent TE classes that were inferred to be recently active from 
the genomic data. For example, LINEs in the copperhead rep- 
resent —4.6% of all transcripts (47 -fold more than in the py- 
thon). In addition, CR1s are particularly frequent in the 
copperhead, comprising —3% of the copperhead transcrip- 
tome sample; this frequency is 122-fold greater than in the 
python transcripts (fig. 7 and supplementary table S1 1, Sup- 
plementary Material online). Bov-B LINEs were also observed 
in both species but were 16-fold more abundant in the cop- 
perhead (at 0.8% of transcripts; supplementary table S1 1 , 
Supplementary Material online). From the genomic data, 
we inferred that Gypsy and DIRS1 LTR retroelements had ex- 
panded recently in the copperhead, and transcriptional data 
show moderately high levels of both of these in the copper- 
head (at 0.54% and 0.18% of transcript reads, respectively) 
yet these were either barely detected or not detected at all in 
the python transcriptome (fig. 7). We also found transcrip- 
tional evidence of hAT DNA transposon activity (which in- 
cludes SPIN element activity) in the copperhead (0.64% of 
reads) at 280-fold greater levels than the python. The highest 
abundance of presumably TE-related transcripts was those 
that were "unclassified" TEs; we found 28-fold greater relative 
abundance of unclassified repeats in the copperhead (5.6% of 
reads) versus the python (0.2% of reads; fig. 7). Although we 
cannot interpret exactly what these unclassified elements rep- 
resent, we expect this category to contain a substantial pro- 
portion of SINEs. Although the liver is not where TEs need to 
be expressed to make new inherited copies, and these tran- 
scripts do not necessarily arise from the TE's own promoters, 
this data suggest the possibility that TE activity in the copper- 
head continues to be high compared with the python. 

Discussion 

Comparison of two snake genomes, spanning ~1 00 My of 
snake evolution, revealed extensive differences in their ge- 
nomic repeat landscapes. Although both snakes contain di- 
verse sets of repeat elements distributed across most major 
element types and superfamilies, the copperhead genome 
contains more of essentially all of these repeats (occupying 
45% of the copperhead genome vs. 21 % of the python ge- 
nome), and many repeats have expanded recently. In com- 
parison, the largest known difference in genomic repeat 
content between placental mammalian genomes occurs be- 
tween the human and the mouse (46% vs. 38%, respec- 
tively), which are separated by —75 My (Waterston et al. 
2002). Thus, for similar levels of temporal divergence, the 
difference in repeat content in these two snakes is excep- 
tional. Furthermore, the greater repetitive content in the 
copperhead is not due to one or a few expanded repeat 
families but is distributed among a diversity of element fam- 
ilies and subfamilies (243 collapsed TE families in the cop- 
perhead vs. 82 in the python). 



TE-related transcripts appear to be expressed at much 
higher levels in copperhead tissue compared with python 
tissue, even when the greater genomic TE abundances in 
the copperhead genome are accounted for. Although we 
surveyed liver rather than gametic tissues for TE activity, 
the 23-fold greater overall levels of TE-related transcripts 
in the copperhead than in the python (fig. 7) suggests that 
TE transcription may be generally more active in the copper- 
head. This is true even in the case of CR1 , for which there are 
25 times more elements in the copperhead than in the py- 
thon genome; there are 122 times more CR1 -related tran- 
scripts in copperhead tissues than in python tissues (fig. 7 
and supplementary table S1 1, Supplementary Material on- 
line). If transcription levels have also increased in germ line 
tissues, they may have contributed to increased genomic TE 
insertion activity. One hypothesis, to explain the observa- 
tions that TEs have higher transcription levels and have been 
more active in the copperhead versus python genomes, is 
that mechanisms known to control TE proliferation (e.g., 
CpG methylation and chromatin structural regulation; 
Yoder et al. 1997; Lippman et al. 2004; Feschotte 2008) 
may be differentially effective in the two snakes. It is also 
possible that TEs may occur in greater proximity to transcrip- 
tional units in the copperhead genome, driving greater lev- 
els of read-through transcription. It is unclear, however, 
what mutational or selective force would have made TEs 
land and become fixed nearer transcriptional units in the 
copperhead than in the python. The increased transcription 
levels in the copperhead also suggest that TEs are more likely 
to influence flanking gene expression in the copperhead 
than in the python. 

Prior to the present study, at least two plausible instan- 
ces of horizontal transfer implicating snake TEs have been 
reported, involving Bov-B LINEs (Kordis and Gubensek 
1998b) and Sauria SINEs (Piskurek and Okada 2007). This 
study provides novel evidence for additional horizontal 
transfer of TEs and by adding genomic data from two 
snake species in addition to the anole lizard, it provides 
the first large-scale comparative view into TE dynamics 
within squamate reptiles. Together, our sequence-based 
data and PCR-based confirmation of the absence of SPIN 
elements in the python (Feschotte C, unpublished data), in 
contrast to the abundance of recently inserted SPIN ele- 
ments in the copperhead, provide compelling new evi- 
dence that, as with mammalian genomes (Pace et al. 
2008; Gilbert et al. 2010), reptilian genomes have been 
differentially invaded by these elements. Although previ- 
ous studies have already suggested horizontal transfer of 
Bov-B LINEs between squamate reptiles and mammals 
(Kordis and Gubensek 1 997, 1 998a, 1 998b), our analysis 
suggests the possibility of multiple transfer events into 
and/or out of squamate genomes (supplementary fig. 
S5, Supplementary Material online). The previous report 
of a poxvirus-mediated transfer of Squaml SINE elements 
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from viperid snakes to rodents demonstrates that viruses 
may sometimes mediate such horizontal transfer events 
(Piskurek and Okada 2007). This transfer is thought to 
be dependent on the enzymatic machinery of a Bov-B LINE 
(Piskurek and Okada 2007), and high transcript levels of 
Bov-B reverse transcriptase in snake tissues, such as those 
found in the copperhead, may thus increase the probabil- 
ity of horizontal transfer events. 

The copperhead lineage also appears to have modified 
microsatellite evolutionary dynamics, including microsa- 
tellite seeding (Arcot et al. 1995; Tay et al. 2010) by 
a snake-specific CR1 LINE family (fig. 6). Our analyses show 
that a high percentage of expanded microsatellite motifs 
were adjacent to readily identifiable snakel CR1 LINEs 
(41 .4% of all ATA and 22.7% of all AATAG SSR loci). These 
findings suggest that microsatellite seeding by these LINEs 
in the copperhead has occurred at a scale that is several 
orders or magnitude greater than any other example that 
we are aware of (Nadir et al. 1996; Tay et al. 2010). The 
similar SSR motif frequencies between the python and 
anole lizard are consistent with previous suggestions that 
SSR evolution and turnover rates in non-avian reptiles are 
generally lower than in mammals (Matsubara et al. 2006; 
Shedlock et al. 2007). In contrast, the increase in SSR con- 
tent and radically different motif frequencies in copper- 
head indicate that SSR turnover rates in squamates can 
evolve even more rapidly than what is known from mam- 
malian genomes. 

Despite its substantial and recently expanded repeat 
content, the copperhead has a genome size that is among 
the smallest of snakes. This is surprising, as it is reasonable 
to expect that small genomes should have low repetitive 
content, as is the case in pythons and birds. We suggest 
that unidentified processes must be acting differentially 
in the copperhead to remove genomic sequence, poten- 
tially due to mechanistic differences in the biology of 
the two snake lineages and/or differences in selection 
on mutations that alter genome size. Further evidence 
of differential processes operating in these two snake lin- 
eages comes from our observation of differences in the rel- 
ative abundance of 3' truncated LINEs between species 
(fig. 4). The excess of short LINEs in the copperhead is con- 
sistent with pressure to limit genome expansion, although 
3' LINE truncation has not been sufficient to balance the 
genome size equation (the total LINE element sequence is 
still considerably greater in the copperhead). The relative 
bias toward short elements in the copperhead could be 
caused by a greatertendency to generate shorter elements 
(at the time of insertion), a greater probability to fix shorter 
elements, and/or a greater probability to delete long ele- 
ments at any time via ectopic recombination, which has 
been proposed to occur in the anole lizard genome (Novick 
et al. 2009). Selection could be more effective in the cop- 
perhead than the python due to a larger effective popu- 



lation size rather than stronger selection against 
genome expansion, but this seems unlikely. It is expected 
that the Nearctic-distributed copperhead lineage suffered 
small effective population sizes due to bottlenecks during 
glacial cycles (Guiher and Burbrink 2008), and the likeli- 
hood of large and stable effective populations in tropical 
lineages such as the python also contraindicates a primary 
role for population size differences as a sole explanation. 

Selection to maintain a smaller genome size has been 
hypothesized numerous times in relation to extreme 
metabolic demands in flighted birds (Hughes and Hughes 
1995), although there is some controversy (Organ et al. 
2007). Previous studies have suggested that extreme met- 
abolic demand in snakes (Secor and Diamond 1995, 1998) 
has resulted in selection to decrease their mitochondrial 
genome size (Jiang et al. 2007), extensive evolutionary 
redesign (Castoe et al. 2008), and previously unprece- 
dented molecular convergence in snake metabolic 
proteins (Castoe et al. 2009b). It is therefore plausible that 
selection related to metabolic demands could have shaped 
snake nuclear genomes. Broader understanding of 
genomic repeat landscapes in snakes may shed greater 
light on this question. There are a range of alternative 
theories about the evolution of genome size and complexity 
(Lynch and Conery 2003), and thus the role of selection in 
snake genome size and structure is a topic of considerable 
interest. 

It is also an open question whether the biology of snake 
genomes may have contributed to the evolution of their 
extreme phenotypes and adaptations (Secor and Diamond 
1995; Cohn and Tickle 1999; Fry et al. 2006; Castoe et al. 
2008, 2009b; Vonk et al. 2008). Among the most conspic- 
uous adaptations in snakes is that some lineages, including 
the ancestors of the copperhead, have evolved complex 
venom repertoires, largely by duplicating and repurposing 
existing genes to produce deadly toxins. Our evidence 
(supplementary table S10, Supplementary Material online), 
and that of others (Nobuhisa et al. 1 998; Fujimi et al. 2002; 
Ikeda et al. 2010), shows a tentative association between 
CR1 LINEs and venom genes. Our genomic sampling sug- 
gests that CR1 LINEs (as well as SSRs and other TEs) have 
expanded substantially in the copperhead lineage, and this 
expansion might be expected to lead to increased rates of 
recombination, unequal crossing over, and gene conver- 
sion (Witherspoon et al. 2009; Stevison and Noor 2010). 
These events could have, at least in part, facilitated the 
expansion and regulatory rewiring of venom gene families 
in venomous snakes. 

Supplementary Material 

Supplementary methods, tables S1-S1 1 , and figures S1-S1 0 
are available at Genome Biology and Evolution online 
(http://www.oxfordjournals.org/our_journals/gbe/). 
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